
    On the Performance of Latent Semantic Indexing-based Information Retrieval

    Conventional vector-based Information Retrieval (IR) models, the Vector Space Model (VSM) and the Generalized Vector Space Model (GVSM), represent documents and queries as vectors in a multidimensional space. This high-dimensional representation places great demands on computing resources. To overcome these problems, Latent Semantic Indexing (LSI), a variant of VSM, projects the documents into a lower-dimensional space computed via Singular Value Decomposition (SVD). The IR literature states that the LSI model is 30% more effective than classical VSM models. However, statistical significance tests are required to evaluate the reliability of such comparisons, and to the best of our knowledge the significance of the LSI model's performance has not been analyzed so far. The focus of this paper is to address this issue. We discuss the tradeoffs of VSM, GVSM, and LSI and empirically evaluate the differences in performance on four test document collections. We then analyze the statistical significance of these performance differences.
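The LSI projection the abstract describes can be illustrated with a minimal numpy sketch. The toy term-document matrix, the choice of k = 2, and the query vector are illustrative assumptions, not data from the paper:

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
A = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
], dtype=float)

# Truncated SVD: keep only the k largest singular triplets.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Fold a query vector into the k-dimensional latent space.
q = np.array([1, 0, 1, 0, 0], dtype=float)
q_hat = np.diag(1.0 / sk) @ Uk.T @ q

# Rank documents by cosine similarity in the latent space.
docs = Vtk.T  # each row is a document in latent coordinates
sims = docs @ q_hat / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat) + 1e-12)
ranking = np.argsort(-sims)
print(ranking)
```

The projection into k dimensions is what distinguishes LSI from plain VSM: documents that share no terms with the query can still rank highly if they co-occur with related terms.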

    Improving Accuracy of Intrusion Detection Model Using PCA and optimized SVM

    Intrusion detection is essential for securing network domains and is mostly used for locating and tracing intruders. Traditional intrusion detection systems (IDS) suffer from problems such as low detection capability against unknown network attacks, high false alarm rates, and insufficient analysis capability. Hence, a major goal of research in this domain is to develop an intrusion detection model with improved accuracy and reduced training time. This paper proposes a hybrid intrusion detection model that integrates principal component analysis (PCA) and a support vector machine (SVM). The novelty of the paper is the optimization of the kernel parameters of the SVM classifier using an automatic parameter selection technique. This technique optimizes the penalty factor (C) and the kernel parameter gamma (γ), thereby improving the accuracy of the classifier and reducing training and testing time. Experimental results on the NSL-KDD and gureKDDCup datasets show that the proposed technique performs better, with higher accuracy, faster convergence, and better generalization. Minimal resources are consumed because the classifier requires only a reduced feature set for optimum classification. A comparative analysis of existing hybrid models against the proposed model is also performed.
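A PCA-plus-SVM pipeline with automatic selection of C and γ can be sketched with scikit-learn's grid search. This is a generic illustration of the approach, not the paper's method: the synthetic data stands in for NSL-KDD / gureKDDCup, and the grid values and component count are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data; the paper uses NSL-KDD / gureKDDCup, which are not bundled here.
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# PCA reduces the feature set; grid search tunes C and gamma automatically.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),
    ("svm", SVC(kernel="rbf")),
])
grid = {"svm__C": [0.1, 1, 10], "svm__gamma": [0.01, 0.1, 1]}
search = GridSearchCV(pipe, grid, cv=3)
search.fit(X_tr, y_tr)
acc = search.score(X_te, y_te)
print(search.best_params_, acc)
```

Reducing dimensionality before the SVM is what cuts training and testing time: the kernel computations run on 10 components rather than the full feature set.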

    Synthetic Data for Feature Selection

    Feature selection is an important and active field of research in machine learning and data science. Our goal in this paper is to propose a collection of synthetic datasets that can be used as a common reference point for feature selection algorithms. Synthetic datasets allow for precise evaluation of the selected features and control over the data parameters for comprehensive assessment. The proposed datasets are based on applications from electronics in order to mimic real-life scenarios. To illustrate the utility of the proposed data, we employ one of the datasets to test several popular feature selection algorithms. The datasets are made publicly available on GitHub and can be used by researchers to evaluate feature selection algorithms.
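The key advantage claimed for synthetic data, that the ground-truth relevant features are known and a selector can be scored exactly, can be shown with a small numpy sketch. The generating model and the correlation-based filter below are illustrative assumptions, not the paper's datasets or algorithms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset with known ground truth: only features 0-2 drive the target.
n, p = 500, 10
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=n)

# Simple filter method: rank features by absolute Pearson correlation with y.
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])
selected = set(np.argsort(-scores)[:3])

# Because the informative features are known by construction, selection
# quality can be evaluated exactly rather than via a downstream model.
print(selected)
```

With real data the informative set is unknown, so this exact evaluation is impossible; that is precisely the gap the proposed synthetic collection fills.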

    Exploring Attributes with Domain Knowledge in Formal Concept Analysis

    Recent literature reports growing interest in data analysis using Formal Concept Analysis (FCA), in which data is represented in the form of object–attribute relations. FCA analyzes and then visualizes the data based on a duality called the Galois connection. Attribute exploration is a knowledge acquisition process in FCA that interactively determines the implications holding between attributes. The objective of this paper is to demonstrate attribute exploration as a means of understanding the dependencies among the attributes in the data. While performing this process, we add domain experts' knowledge as background knowledge. We demonstrate the method through experiments on two real-world healthcare datasets. The results show that the knowledge acquired through the exploration process, coupled with domain expert knowledge, yields better classification accuracy.
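The Galois connection and the implications that attribute exploration tests can be sketched with the two standard derivation operators. The toy healthcare-style context below is an invented illustration, not the paper's data:

```python
# Formal context: objects -> attributes (toy example, not the paper's data).
context = {
    "p1": {"fever", "cough"},
    "p2": {"fever", "cough", "fatigue"},
    "p3": {"fatigue"},
}

def up(objects):
    """Derivation operator: attributes common to all given objects."""
    if not objects:
        return {a for attrs in context.values() for a in attrs}
    return set.intersection(*(context[g] for g in objects))

def down(attrs):
    """Derivation operator: objects possessing all given attributes."""
    return {g for g, s in context.items() if attrs <= s}

def implies(premise, conclusion):
    """An implication premise -> conclusion holds iff conclusion is a
    subset of the closure premise'' under the Galois connection."""
    return conclusion <= up(down(premise))

print(implies({"cough"}, {"fever"}))   # every object with cough also has fever
print(implies({"fever"}, {"fatigue"})) # p1 has fever but not fatigue
```

Attribute exploration iterates over candidate implications like these, asking a domain expert to confirm each one or supply a counterexample object, which is where background knowledge enters the process.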

    A study and analysis of recommendation systems for location-based social network (LBSN) with big data

    Recommender systems play an important role in our day-to-day lives. A recommender system automatically suggests items that a user might be interested in. Small-scale datasets are commonly used to provide location-based recommendations, but in real deployments the volume of data is large. We selected the Foursquare dataset to study the need for big data in recommendation systems for location-based social networks (LBSNs). Quality parameters such as parallel processing and a multimodal interface were selected to examine this need. This paper provides a study and analysis of the quality parameters of recommendation systems for LBSNs with big data.

    Revisiting Fully Homomorphic Encryption Schemes

    Homomorphic encryption is an encryption technique that allows computations to be performed on encrypted data without requiring decryption. This property makes homomorphic encryption suitable for secure computation in sensitive-data scenarios such as cloud computing, medical data exchange, and financial transactions. In homomorphic encryption, the data is encrypted with a public key and the computation is carried out on the ciphertext by an algorithm that preserves the encryption; the result is then decrypted with a private key to obtain the final output. This protects the data while still allowing complicated computations to be performed on it, resulting in a secure and efficient approach to analysing sensitive information. This article is intended to give a clear idea of the various fully homomorphic encryption schemes in the literature and to analyse and compare the results of each of these schemes. We also survey applications and open-source tools for homomorphic encryption schemes.
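The encrypt-compute-decrypt workflow described above can be demonstrated with a toy Paillier cryptosystem. Note the hedge: Paillier is only additively (partially) homomorphic, not one of the fully homomorphic schemes the article surveys, and the tiny fixed primes below are purely for illustration and offer no security:

```python
import math
import random

# Toy Paillier keypair: tiny primes for illustration only, never for real use.
p, q = 293, 433
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # simplification valid because g = n + 1

def encrypt(m):
    """Enc(m) = g^m * r^n mod n^2, with random r coprime to n."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

# Additive homomorphism: multiplying ciphertexts adds the plaintexts.
c = (encrypt(40) * encrypt(2)) % n2
print(decrypt(c))  # 42
```

Fully homomorphic schemes go further by supporting both addition and multiplication on ciphertexts of arbitrary depth, which is what libraries such as Microsoft SEAL and OpenFHE implement.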

    Latent Semantic Indexing Using Eigenvalue Analysis for Efficient Information Retrieval

    Text retrieval using Latent Semantic Indexing (LSI) with the truncated Singular Value Decomposition (SVD) has been intensively studied in recent years. However, the expensive computation of the truncated SVD constitutes a major drawback of the LSI method. In this paper, we demonstrate how the matrix rank approximation can influence the effectiveness of information retrieval systems. In addition, we present an implementation of the LSI method based on an eigenvalue analysis for rank approximation without computing the truncated SVD, along with its computational details. Significant improvements in computational time, while maintaining retrieval accuracy, are observed over the tested document collections.
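The idea of obtaining the rank-k latent space from an eigenvalue analysis rather than a truncated SVD can be sketched in numpy. This is a generic illustration under assumed toy data, not the paper's implementation: the eigendecomposition of the small document-document Gram matrix AᵀA yields the right singular vectors and squared singular values of A directly:

```python
import numpy as np

# Toy term-document matrix (terms x documents); distinct singular values.
A = np.array([
    [2., 0., 1.],
    [0., 1., 0.],
    [1., 0., 1.],
    [0., 2., 0.],
])

# Eigen-decomposition of the Gram matrix A^T A: its eigenvectors are the
# right singular vectors of A and its eigenvalues the squared singular values,
# so the rank-k space is obtained without a full SVD of A.
G = A.T @ A
eigvals, V = np.linalg.eigh(G)        # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
k = 2
Vk = V[:, order[:k]]                  # top-k right singular vectors
sk = np.sqrt(np.maximum(eigvals[order[:k]], 0))

# Rank-k approximation A_k = A Vk Vk^T; compare with SVD-based truncation.
A_k = A @ Vk @ Vk.T
U, s, Vt = np.linalg.svd(A)
A_k_svd = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.allclose(A_k, A_k_svd))
```

Since AᵀA is only of size documents × documents (or one can use AAᵀ, whichever is smaller), the eigenproblem can be much cheaper than a truncated SVD of the full matrix, which is the computational saving the abstract reports.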